47 research outputs found

    Cross-validation applied to Approximate Bayesian Computation

    Approximate Bayesian Computation (ABC) is a modern technique for simulating the posterior distribution when the likelihood cannot be determined analytically. ABC is currently applied mainly in population genetics. An important and not yet adequately answered question in applying ABC is how to choose the acceptance threshold used when simulating the posterior distribution. This work examines whether cross-validation can serve as a tool for selecting the acceptance threshold.
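To make the role of the acceptance threshold concrete, here is a minimal Python sketch of rejection ABC in which the threshold is picked by cross-validation: simulated data sets are treated in turn as if they were observed, ABC is rerun on the remaining simulations, and the threshold with the smallest squared error of the posterior mean is kept. This is an illustration under assumptions, not the procedure studied in the work above: the toy model (20 draws from a normal distribution with unknown mean, summarized by the sample mean), the Uniform(-5, 5) prior, the absolute-distance acceptance rule and all function names are hypothetical.

```python
import random
import statistics


def simulate_summary(theta, rng, n=20):
    # Hypothetical toy model: n draws from Normal(theta, 1), summarized by their mean
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def reference_table(size, rng):
    # Draw theta from an assumed Uniform(-5, 5) prior and simulate a summary for each
    table = []
    for _ in range(size):
        theta = rng.uniform(-5.0, 5.0)
        table.append((theta, simulate_summary(theta, rng)))
    return table


def abc_rejection(table, s_obs, eps):
    # Rejection ABC: accept theta if its simulated summary lies within eps of s_obs
    return [theta for theta, s in table if abs(s - s_obs) <= eps]


def cv_error(table, eps, n_val=50, seed=1):
    # Cross-validation: treat n_val simulated data sets as pseudo-observed data,
    # rerun ABC on the remaining simulations and score the posterior mean
    rng = random.Random(seed)  # same seed -> same pseudo-observations for every eps
    errors = []
    for i in rng.sample(range(len(table)), n_val):
        theta_true, s_pseudo = table[i]
        accepted = abc_rejection(table[:i] + table[i + 1:], s_pseudo, eps)
        if accepted:  # pseudo-observations with no accepted simulations are skipped
            errors.append((statistics.fmean(accepted) - theta_true) ** 2)
    return statistics.fmean(errors) if errors else float("inf")


def choose_threshold(table, candidates, n_val=50, seed=1):
    # Pick the acceptance threshold with the smallest cross-validated squared error
    return min(candidates, key=lambda eps: cv_error(table, eps, n_val, seed))


if __name__ == "__main__":
    table = reference_table(2000, random.Random(0))
    print(choose_threshold(table, [0.05, 0.2, 1.0, 10.0]))
```

Very small thresholds accept few simulations and give noisy posterior estimates, while very large ones accept nearly everything and return roughly the prior mean; the cross-validated error makes this trade-off explicit. Skipping pseudo-observations with no accepted simulations is a simplification that a careful implementation would need to handle.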

    Deficiency of nucleotide excision repair is associated with mutational signature observed in cancer

    Nucleotide excision repair (NER) is one of the main DNA repair pathways that protect cells against genomic damage. Disruption of this pathway can contribute to the development of cancer and accelerate aging. Mutational characteristics of NER-deficiency may reveal important diagnostic opportunities, as tumors deficient in NER are more sensitive to certain treatments. Here, we analyzed the genome-wide somatic mutational profiles of adult stem cells (ASCs) from NER-deficient Ercc1−/Δ mice. Our results indicate that NER-deficiency increases the base substitution load twofold in liver but not in small intestinal ASCs, which coincides with the tissue-specific aging pathology observed in these mice. Moreover, NER-deficient ASCs of both tissues show an increased contribution of Signature 8 mutations, which is a mutational pattern with unknown etiology that is recurrently observed in various cancer types. The scattered genomic distribution of the base substitutions indicates that deficiency of global-genome NER (GG-NER) underlies the observed mutational consequences. In line with this, we observe increased Signature 8 mutations in a GG-NER-deficient human organoid culture, in which XPC was deleted using CRISPR-Cas9 gene-editing. Furthermore, genomes of NER-deficient breast tumors show an increased contribution of Signature 8 mutations compared with NER-proficient tumors. Elevated levels of Signature 8 mutations could therefore contribute to a predictor of NER-deficiency based on a patient's mutational profile
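The abstract does not state how signature contributions were estimated. One common generic approach is to refit a mutational profile (counts per mutation type) as a non-negative combination of known signature profiles, i.e. non-negative least squares. The Python sketch below solves that problem with projected coordinate descent. The four-type toy signatures are hypothetical stand-ins, not the real 96-type signature profiles, and `refit_signatures` is an illustrative name.

```python
def refit_signatures(signatures, counts, n_iter=500):
    """Non-negative least squares: minimize ||S x - counts||^2 subject to x >= 0.

    signatures: list of K profiles, each a list of M mutation-type probabilities
    counts: list of M observed mutation counts for one sample
    Returns the estimated contribution (number of mutations) of each signature.
    """
    m = len(counts)
    x = [0.0] * len(signatures)
    r = [float(c) for c in counts]  # residual counts - S x (x starts at zero)
    norms = [sum(v * v for v in sig) for sig in signatures]
    for _ in range(n_iter):
        for j, sig in enumerate(signatures):
            # Exact minimization along coordinate j, then projection onto x_j >= 0
            grad = sum(sig[i] * r[i] for i in range(m))
            new = max(0.0, x[j] + grad / norms[j])
            delta = new - x[j]
            if delta:
                for i in range(m):
                    r[i] -= delta * sig[i]  # keep the residual in sync with x
                x[j] = new
    return x


if __name__ == "__main__":
    sig_a = [0.5, 0.5, 0.0, 0.0]  # hypothetical signature profiles
    sig_b = [0.0, 0.25, 0.25, 0.5]
    counts = [20, 35, 15, 30]  # exactly 40 * sig_a + 60 * sig_b
    print([round(v, 6) for v in refit_signatures([sig_a, sig_b], counts)])  # [40.0, 60.0]
```

A higher fitted contribution of one signature in NER-deficient samples than in controls is the kind of comparison the abstract describes; a real analysis would also need to account for fitting uncertainty.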

    Pathway and network analysis of more than 2500 whole cancer genomes.

    The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well characterized, and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2,583 whole cancer genomes from 27 tumor types compiled by the PCAWG project. This approach was motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes have variable proportions of coding and non-coding mutations: chromatin remodeling and proliferation pathways are altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, are altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit gene expression signatures similar to those of samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.

    Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis.

    Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis
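Enrichment statements such as "CLC genes are enriched amongst driver genes" are commonly assessed with a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test) on the overlap between two gene sets. The abstract does not say which test was used, so the following Python function is only a generic sketch; its name and argument names are hypothetical, and any numbers passed to it here are illustrative, not results from the study.

```python
from math import comb


def hypergeom_enrichment_p(total, n_drivers, n_set, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    total: number of genes in the background (universe)
    n_drivers: how many background genes are drivers
    n_set: size of the gene set being tested (e.g. a curated lncRNA list)
    overlap: how many genes of the set are also drivers
    """
    denom = comb(total, n_set)
    upper = min(n_drivers, n_set)
    # Sum the probability of every overlap at least as large as the observed one
    return sum(
        comb(n_drivers, x) * comb(total - n_drivers, n_set - x)
        for x in range(overlap, upper + 1)
    ) / denom


if __name__ == "__main__":
    # Hypothetical numbers purely for illustration
    print(hypergeom_enrichment_p(total=20000, n_drivers=500, n_set=120, overlap=15))
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for genome-scale backgrounds; only the final division produces a float.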

    Analyses of non-coding somatic drivers in 2,658 cancer whole genomes.

    The discovery of drivers of cancer has traditionally focused on protein-coding genes [1-4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
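The abstract does not spell out the rule used to combine significance levels from several driver-discovery methods. As the simplest textbook illustration of the idea, the Python sketch below implements Fisher's method: under the null hypothesis, -2 times the sum of the log p-values follows a chi-square distribution with 2k degrees of freedom. Fisher's method assumes the k p-values are independent, which p-values from different methods run on the same data generally are not, so this is not presented as the consortium's strategy; `fisher_combine` is a hypothetical name.

```python
import math


def fisher_combine(pvalues):
    """Combine k independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)  # ~ chi-square(2k) under H0
    half = x / 2.0
    if half == 0.0:
        return 1.0  # all p-values equal 1
    log_half = math.log(half)
    # Chi-square survival function for even df = 2k has a closed form:
    # P(X >= x) = sum_{j=0}^{k-1} exp(-x/2) * (x/2)^j / j!
    # Each term is computed in log space to avoid overflow for large x.
    tail = sum(math.exp(j * log_half - half - math.lgamma(j + 1)) for j in range(k))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical p-values for one genomic element from three methods
    print(fisher_combine([0.04, 0.10, 0.30]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check of the closed-form tail.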

    An approximate maximum likelihood algorithm with case studies

    The likelihood function is the basis of many statistical inference procedures, in both Bayesian and classical statistics. However, in various areas such as population genetics, systems biology, epidemiology, queuing systems and spatial statistics, complex stochastic models are required for which the likelihood cannot be obtained analytically. In recent years, increasing computing power has made it possible to circumvent this problem with simulation-based methods such as Indirect Inference and Approximate Bayesian Computation (ABC). In its most basic form, ABC involves sampling from the parameter space and keeping those parameters that produce data that fit the actually observed data sufficiently well. Exploring the whole parameter space, however, makes this approach inefficient in high-dimensional problems, which led to the proposal of more sophisticated iterative inference methods such as particle filters. Here, we propose an alternative approach based on stochastic gradient methods, in which the ascent directions are determined by simulation. By moving along a simulated gradient, the algorithm produces a sequence of estimates that eventually converges to the maximum likelihood estimate (or, equivalently, to the maximum of the posterior). This approach reduces the number of simulations in regions of low likelihood while remaining flexibly applicable to a large variety of problems. We present a set of conditions under which the algorithm converges to the maximum likelihood estimate with probability 1, and we explore the properties of the resulting estimator in practical applications. To this end, we first propose a set of tuning guidelines that improve the robustness of the algorithm against overly noisy simulation results. Then, we investigate the performance of our approach in simulation studies with normally distributed data and apply the algorithm to two models with intractable likelihood functions. First, we present an application to parameter estimation for a queuing process. Second, we re-analyse population genetic data and estimate parameters describing the demographic history of the Bornean and Sumatran orang-utan populations.
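The idea of "moving along a simulated gradient" can be sketched in Python as a Kiefer-Wolfowitz-style stochastic approximation. At each step, the log-likelihood of the observed summary statistic is estimated by a Gaussian kernel density over simulated summaries at two nearby parameter values, and the parameter moves along the resulting finite-difference gradient with decreasing gains. This is a minimal sketch under assumptions, not the algorithm or tuning guidelines of the thesis. The toy model (20 normal draws with unknown mean, summarized by their mean), the kernel bandwidth `h`, the gain sequences `a / k` and `c / k**(1/6)`, the use of common random numbers, and all function names are hypothetical choices.

```python
import math
import random


def simulate_summaries(theta, seed, n_sim=100, n_obs=20):
    # Hypothetical toy model: each data set is n_obs draws from Normal(theta, 1),
    # summarized by its mean; reusing `seed` gives common random numbers
    rng = random.Random(seed)
    return [theta + sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
            for _ in range(n_sim)]


def kde_loglik(s_obs, sims, h=0.3):
    # Log of a Gaussian kernel density estimate at s_obs (log-sum-exp for stability)
    logs = [-0.5 * ((s_obs - s) / h) ** 2 for s in sims]
    top = max(logs)
    log_sum = top + math.log(sum(math.exp(v - top) for v in logs))
    return log_sum - math.log(len(sims) * h * math.sqrt(2 * math.pi))


def approx_mle(s_obs, theta0, n_iter=200, a=0.15, c=0.2, seed=0):
    # Stochastic approximation: climb a simulated finite-difference gradient with
    # gains a_k -> 0 and difference widths c_k -> 0 (Kiefer-Wolfowitz style)
    theta = theta0
    rng = random.Random(seed)
    for k in range(1, n_iter + 1):
        a_k, c_k = a / k, c / k ** (1 / 6)
        s = rng.randrange(2**31)  # same random numbers on both sides of the difference
        grad = (kde_loglik(s_obs, simulate_summaries(theta + c_k, s))
                - kde_loglik(s_obs, simulate_summaries(theta - c_k, s))) / (2 * c_k)
        theta += a_k * grad
    return theta


if __name__ == "__main__":
    # In this toy model the maximum likelihood estimate is the observed mean, 1.5
    print(approx_mle(s_obs=1.5, theta0=0.0))
```

Simulations are spent only near the current estimate rather than across the whole parameter space, which is the efficiency argument made in the abstract; in this sketch, convergence depends on the gain constants, which are exactly the kind of tuning choices the abstract refers to.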

    Can secondary contact following range expansion be distinguished from barriers to gene flow?

    Secondary contact is the re-establishment of gene flow between sister populations that have diverged. For instance, at the end of the Quaternary glaciations in Europe, secondary contact occurred during the northward expansion of populations that had found refugia in the southern peninsulas. With the advent of multi-locus markers, secondary contact can be investigated using various molecular signatures, including gradients of allele frequency, admixture clines, and local increases in genetic differentiation. We use coalescent simulations to investigate whether molecular data provide enough information to distinguish between secondary contact following range expansion and an alternative evolutionary scenario consisting of a barrier to gene flow in an isolation-by-distance model. We find that an excess of linkage disequilibrium and of genetic diversity at the suture zone is a unique signature of secondary contact. We also find that the directionality index ψ, which was proposed to study range expansion, is informative for distinguishing between the two hypotheses. However, although evidence for secondary contact is usually conveyed by statistics related to admixture coefficients, we find that these statistics can be confounded by isolation-by-distance. We recommend accounting for the spatial distribution of individuals when investigating secondary contact, in order to better reflect the complex spatio-temporal evolution of populations and species.
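Roughly speaking, the directionality index ψ compares derived allele frequencies between two population samples at sites where the derived allele is present in both. The Python sketch below computes a simplified version under stated assumptions: both samples contain the same number n of haplotypes (the published statistic handles unequal sample sizes, which is not reproduced here), sites where the derived allele is fixed in both samples are excluded, and the sign convention (population 2 minus population 1) is an assumption. `directionality_index` is a hypothetical name.

```python
def directionality_index(derived_1, derived_2, n):
    """Simplified directionality index psi (hypothetical helper).

    derived_1, derived_2: per-site derived allele counts in samples from
    population 1 and population 2, each assumed to contain n haplotypes.
    Returns the mean of (f2 - f1) over sites where the derived allele is
    present in both samples but not fixed in both.
    """
    diffs = [
        (c2 - c1) / n
        for c1, c2 in zip(derived_1, derived_2)
        if c1 > 0 and c2 > 0 and not (c1 == n and c2 == n)
    ]
    if not diffs:
        raise ValueError("no sites carry the derived allele in both samples")
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical counts for four sites, n = 5 haplotypes per sample
    print(directionality_index([1, 2, 0, 5], [3, 2, 4, 5], n=5))  # 0.2
```

In the example, the third site is skipped because the derived allele is absent from the first sample, and the fourth because it is fixed in both, leaving the mean of 0.4 and 0.0.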